ABSTRACT

Recent theoretical and experimental advances have demonstrated
that the sounds of human speech make human language an effective
medium of communication through a process of speech "encoding." The
presence of sounds like the language-universal vowels /a/, /u/, and
/i/ makes this process possible. In the past five years we have
shown that the anatomic basis of human speech is species-specific.
We have recently been able to reconstruct the supralaryngeal vocal
tracts of extinct hominid species. These reconstructions make use
of the methods of comparative anatomy and skeletal similarities that
exist between extinct fossils and living primates like newborn Homo
sapiens and the nonhuman primates. Computer-implemented supralaryngeal
vocal tract modelling indicates that these extinct species
lacked the anatomic ability that is necessary to produce the range
of sounds that is necessary for human speech. Human linguistic
ability depends, in part, on the gradual evolution of modern man’s
supralaryngeal vocal tract. Species like "classic" Neanderthal man
undoubtedly had language, but their linguistic ability was markedly
inferior to modern man’s.

Human language is one of the defining characteristics that differentiate
modern man from all other animals. The traditional view concerning the
uniqueness of human linguistic ability is that it is based on man’s mental
processes (Lenneberg, 1967). In other words the "uniqueness" of human language
is supposed to be entirely due to the properties of the human brain.
The particular sounds that are employed in human language are therefore often
viewed as an arbitrary, fortuitously determined set of cipher-like elements.
Any other set of sounds or gestures supposedly would be just as useful at the
communicative, i.e., the phonetic, level of human language.

The results of recent research have, however, challenged this view. The
"motor theory" of speech perception that has been developed over the past fifteen
years, in essence, states that speech signals are perceived in terms of
the constraints that are imposed by the human vocal apparatus (Liberman et al.,
1967). Other recent research, which I will attempt to summarize in this
paper, indicates that the anatomic basis of human speech production is itself
species-specific. This research is the product of a collaborative effort involving
many skills. Edmund S. Crelin of the Yale University School of
Medicine, Dennis H. Klatt of [...], Peter Wolff of Harvard University, and
my colleagues at the University of Connecticut and Haskins Laboratories have
all been involved at one time or another. Our research indicates that the
anatomic basis of human speech production is the result of a long evolutionary
process in which the Darwinian process of natural selection acted to retain
mutations that would enhance rapid communication through the medium of
speech. The neural processes that are involved in the perception of speech
and the unique species-specific aspects of the human supralaryngeal vocal
tract furthermore appear to be interrelated in a positive way.

Vocal Tract Reconstruction 

The most direct approach to this topic is to start with our most recent
experimental technique, the reconstruction and functional modelling of the
speech-producing anatomy of extinct fossil hominids. We have been able to
reconstruct the evolution of the human supralaryngeal vocal tract by making
use of the methods of comparative anatomy and skeletal similarities that exist
between extinct fossil hominids and living primates (Lieberman and Crelin,
1971). In Figure 1 inferior views of the base of the skull are shown for newborn
modern man, a reconstruction of the fossil La Chapelle-aux-Saints Neanderthal
man, and an adult modern man. The detailed morphology of the base of
the skull and mandible, which is similar in newborn modern man and Neanderthal
man, forms the basis for the Neanderthal reconstruction. Some of the
skull features that are similar in newborn modern man and Neanderthal man, but
different from adult modern man, are as follows: (1) the skulls have a generally
flattened out base; (2) they lack a chin; (3) the body of the mandible is
60 to 100 percent longer than the ramus; (4) the posterior border of the mandibular
ramus is markedly slanted away from the vertical plane; (5) there is
a more horizontal inclination of the mandibular foramen leading to the mandibular
canal; (6) the pterygoid process of the sphenoid bone is relatively
short and its lateral lamina is more inclined away from the vertical plane;
(7) the styloid process is more inclined away from the vertical plane; (8)
the dental arch of the maxilla is U-shaped instead of V-shaped; (9) the
basilar part of the occipital bone between the foramen magnum and the sphenoid
bone is only slightly inclined away from the horizontal toward the vertical
plane; (10) the roof of the nasopharynx is a relatively shallow elongated
arch; (11) the vomer bone is relatively short in its vertical height
and its posterior border is inclined away from the vertical plane; (12) the
vomer bone is relatively far removed from the junction of the sphenoid bone
and the basilar side part of the occipital bone; (13) the occipital condyles
are relatively small and elongated. These similarities are in accord with
other skeletal features typical of Neanderthal fossils (Vlček, 1970), which
may be seen in the course of the ontogenetic development of modern man. This,
parenthetically, does not mean that Neanderthal man was a direct ancestral
form of modern man since Neanderthal fossils exhibit specializations like
brow ridges that never occur in the ontogenetic development of modern man.
Modern man, furthermore, deviates quite drastically from Neanderthal man in
the course of normal maturation from the newborn state.

In Figure 2 lateral views of the skull, vertebral column, and larynx of
newborn and adult modern man and Neanderthal man are presented. The significance
of the aforementioned skeletal features with regard to the supralaryngeal
vocal tract can be seen in the high position of the larynx in newborn and in
Neanderthal.

(After Lieberman and Crelin, 1971.)

Skull, Vertebral Column, and Larynx

In Figure 3 the supralaryngeal air passages of newborn and adult man and
the Neanderthal reconstruction are diagrammed so that they appear equal in
size. Although the nasal and oral cavities of Neanderthal are actually larger
than those of adult modern man, they are quite similar in shape to those of
the newborn. The long "flattened out" base of the skull in newborn and Neanderthal
is a concomitant skeletal correlate of a supralaryngeal vocal tract
in which the entrance to the pharynx lies behind the entrance to the larynx.
In the ontogenetic development of adult modern man the opening of the larynx
into the pharynx shifts to a low position. In this shift the epiglottis becomes
widely separated from the soft palate. The posterior part of the tongue,
between the foramen cecum and the epiglottis, shifts from a horizontal resting
position within the oral cavity to a vertical resting position, to form the
anterior wall of the oral part of the pharynx (Figure 3C).

The uniqueness of the adult human supralaryngeal vocal tract rests in the
fact that the pharynx and oral cavities are almost equal in length and are at
right angles. No other animal has this "bent" supralaryngeal vocal tract in
which the cross-sectional areas of the oral and pharyngeal cavities can be
independently modified. The human vocal tract can, in effect, function as a
"two tube" acoustic filter. In Figure 4 we have diagrammed the "bent" human
supralaryngeal vocal tract in the production of the "extreme," "point" vowels
/i/, /a/, and /u/. Note that the midpoint area function changes are both
extreme and abrupt. Abrupt discontinuities can be formed at the midpoint
"bend." In Figure 5 the nonhuman "straight" vocal tract which is typical of
all living nonhuman primates (Lieberman, 1968; Lieberman et al., 1969, and
Lieberman et al., in press), newborn humans (Lieberman et al., 1968), and Neanderthal
man, is diagrammed as it approximates these vowels. All area function
adjustments have to take place in the oral cavity in the nonhuman supralaryngeal
vocal tract. Although midpoint constrictions obviously can be formed in the
nonhuman vocal tract, they cannot be both extreme and abrupt. The elastic
properties of the tongue prevent it from forming abrupt discontinuities at
the midpoint of the oral cavity.

Vocal Tract Modelling 

Human speech is essentially the product of a source, the larynx for vowels,
and a supralaryngeal vocal tract transfer function. The supralaryngeal vocal
tract in effect filters the source (Chiba and Kajiyama, 1958; Fant, 1960).
The activity of the larynx determines the fundamental frequency of the vowel,
whereas its formant frequencies are the resonant modes of the supralaryngeal
vocal tract. The formant frequencies are determined by the area function of
the supralaryngeal vocal tract. Man uses his articulators (the tongue, lips,
mandible, pharyngeal constrictors, etc.) to modify dynamically in time the
formant frequency patterns that the supralaryngeal vocal tract imposes on the
speech signal. The phonetic inventory of a language is therefore limited by
(1) the number of source function modifications that a speaker is capable of
controlling during speech communication and (2) the number of formant frequency
patterns available by changing the supralaryngeal area function through
the dynamic manipulation of the articulators. We thus can assess the contribution
of the supralaryngeal vocal tract to the phonetic abilities of a hominid,
independent of the source characteristics. A computer-implemented model of a
supralaryngeal vocal tract (Henke, 1966) can be used to determine the possible
contribution of the vocal tract to the phonetic repertoire. We can conveniently
begin to determine whether a nonhuman supralaryngeal vocal tract can produce
the range of sounds that occur in human language by exploring its vowel-producing
ability. Consonantal vocal tract configurations can also be modelled. It
is, however, reasonable to start with vowels since the production of consonants
may also involve rapid, coordinated articulatory maneuvers and we can only
speculate on the presence of this ability in fossil hominids.

Schematic Diagram of the "Bent" Human Supralaryngeal Vocal Tract
Note that abrupt and extreme discontinuities in cross-sectional area
can occur at the midpoint.

Note that abrupt midpoint constrictions cannot be formed.
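
The source-filter description above can be made concrete with the simplest possible area function. The sketch below is an illustration only and is not the Henke (1966) model used in this study: it assumes a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of the quarter-wavelength frequency. The function name, the tube lengths, and the speed of sound (35,000 cm/sec) are assumptions chosen for the example.

```python
# Illustrative sketch only: resonances of a uniform tube, closed at one end
# (glottis) and open at the other (lips). This is NOT the Henke (1966) model;
# the function name and default values are assumptions for the example.

SPEED_OF_SOUND_CM_PER_SEC = 35000.0  # assumed value for warm, moist air


def uniform_tube_formants(length_cm, count=3, speed=SPEED_OF_SOUND_CM_PER_SEC):
    """Return the first `count` resonances (Hz) of a uniform closed-open tube.

    The n-th resonance is F_n = (2n - 1) * c / (4 * L).
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * n - 1) * speed / (4.0 * length_cm) for n in range(1, count + 1)]


if __name__ == "__main__":
    # A 17.5 cm tube stands in for an adult vocal tract and 8.75 cm for a much
    # shorter one. Both lengths are hypothetical round numbers.
    for length in (17.5, 8.75):
        print(length, uniform_tube_formants(length))
```

Halving the tube length doubles every resonance, while the fundamental frequency of the source does not enter the calculation at all. That separation is what allows the contribution of the supralaryngeal vocal tract to be assessed independently of the larynx.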

In Figure 6 we have presented area functions of the supralaryngeal vocal
tract of Neanderthal man that were modelled on the computer. These area functions
were directed towards best approximating the human vowels /i/, /a/, and
/u/. Our computer modelling (Lieberman and Crelin, 1971) was guided by the
results of X-ray motion pictures of speech production, swallowing, and respiration
in adult human (Haskins Laboratories, 1962; Perkell, 1969) and in newborn
(Truby et al., 1965). This knowledge plus the known comparative anatomy
of the living primates allowed a fairly "conservative" simulation of the
vowel-producing ability of classic Neanderthal man. We perhaps allowed a greater
vowel-producing range for Neanderthal man since we consistently generated area
functions that were more human-like than ape-like whenever we were in doubt.
Despite these compensations the Neanderthal vocal tract cannot produce /i/,
/a/, or /u/.

In Figure 7 the formant frequency patterns calculated by the computer
program for the numbered area functions of Figure 6 are plotted. The labelled
loops are derived from the Peterson and Barney (1952) analysis of the vowels 
of American-English of 76 adult men, adult women, and children. Each loop 
encloses the data points that accounted for 90 percent of the samples in each 
vowel category. We have compared the formant frequencies of the simulated 
Neanderthal vocal tract with this comparatively large sample of human speakers 
since it shows that the speech deficiencies of the Neanderthal vocal tract 
are different in kind from the differences that characterize human speakers. 
Since all human speakers can inherently produce all the vowels of American- 
English, we have established that the Neanderthal phonetic repertoire is 
inherently limited. In some instances we generated area functions that would 
be human-like, even though we felt that we were forcing the articulatory limits 
of the reconstructed Neanderthal vocal tract (e.g., area functions 3, 9, and 
13). However, even with these articulatory gymnastics the Neanderthal vocal 
tract could not produce the vowel range of American-English. 

Functional Phonetic Limitations



There are some special considerations that follow from the absence of
the vowels /i/, /a/, and /u/ from the Neanderthal phonetic repertoire. Phonetic
analyses have shown that these "point" vowels are the limiting articulations
of a vowel triangle that is almost language universal (Troubetzkoy, 1939).
The special nature of /i/, /a/, and /u/ can be argued from theoretical grounds
as well. Employing simplified and idealized area functions (similar to those
sketched in Figure 4) Stevens (1969) has shown that these articulatory configurations
(1) are acoustically stable for small changes in articulation and
therefore require less precision in articulatory control than similar adjacent
articulations and (2) contain a prominent acoustic feature, i.e., two formants
that are in close proximity to form a distinct energy concentration.
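
The second point can be illustrated with a deliberately crude calculation. The sketch below is a simplification under stated assumptions and is not Stevens' (1969) analysis: it treats an /a/-like configuration as two tubes, a narrow pharyngeal tube and a wide oral tube, and ignores the acoustic coupling between them, so that each tube contributes its own quarter-wavelength resonances. The function names, the tube lengths, and the speed of sound are hypothetical values chosen for the example.

```python
# Illustrative sketch only: a decoupled two-tube approximation of an /a/-like
# vocal tract. Coupling between the tubes is ignored, so the numbers are
# rough; names and dimensions are assumptions, not values from the paper.

SPEED_OF_SOUND_CM_PER_SEC = 35000.0  # assumed value


def quarter_wave_resonances(length_cm, count, speed=SPEED_OF_SOUND_CM_PER_SEC):
    """Resonances (Hz) of a tube that is effectively closed at one end."""
    return [(2 * n - 1) * speed / (4.0 * length_cm) for n in range(1, count + 1)]


def two_tube_formants(back_cm, front_cm, count=2):
    """Lowest `count` resonances of a narrow back tube plus a wide front tube.

    With a large area ratio at the junction, each tube behaves roughly like an
    independent quarter-wavelength resonator, so the formants of the whole
    tract are approximated by pooling and sorting both sets of resonances.
    """
    pooled = (quarter_wave_resonances(back_cm, count)
              + quarter_wave_resonances(front_cm, count))
    return sorted(pooled)[:count]


if __name__ == "__main__":
    # Equal-length pharyngeal and oral tubes: the two lowest resonances coincide.
    print(two_tube_formants(8.75, 8.75))
    # Moving the junction by half a centimetre separates them only slightly.
    print(two_tube_formants(9.25, 8.25))
```

When the two tubes are of equal length the two lowest resonances fall together, giving the kind of energy concentration described above. In a real vocal tract the coupling between the cavities keeps the two formants from coinciding exactly, so the figures should be read as qualitative only.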

The vowels /i/, /a/, and /u/ have another unique acoustical property.
They are the only vowels in which an acoustic pattern can be related to a
unique vocal tract area function. Other "central" vowels can be produced by
means of several alternate area functions (Stevens and House, 1955). A human
listener, when he hears a syllable that contains a token of /i/, /a/, or /u/,
can calculate the size of the supralaryngeal vocal tract that was used to
produce the syllable. The listener, in other words, can tell whether a speaker
with a large or small vocal tract is speaking. This is not possible for other
vowels since a speaker with a small tract can, for example, by increasing the
degree of lip rounding, produce a token of /U/ that would be consistent with
a larger vocal tract with less lip rounding. These uncertainties do not
exist for /i/, /a/, and /u/ since the required discontinuities and constrictions
in the supralaryngeal vocal tract area functions produce acoustic patterns that
are beyond the range of compensatory maneuvers.

Formant Frequencies Calculated by Computer Program for Neanderthal Reconstruction
The numbers refer to area functions in Figure 6. (After Lieberman and Crelin, 1971.)



Speech Perception and Speech Anatomy 

We noted, at the start of this paper, that the results of perceptual
research have demonstrated that human listeners perceive speech in terms of
the constraints imposed by the speech-producing apparatus. This mode of perception,
which has been termed the "speech" or "motor" theory mode of perception,
makes the rapid rate of information transfer of human speech possible
(Liberman, 1970). Human listeners can perceive as many as 30 phonetic segments
per second in normal speech. This information rate far exceeds the
temporal resolving power of the human auditory system. It is, for example,
impossible even to count simple pulses at rates of 20 pulses per second. The
pulses merge into a continuous tone. Human speech achieves its high information
rate by means of an "encoding" process that is structured in terms of
the anatomic and articulatory constraints of speech production. The motor
theory of speech perception, in essence, explicates this process. The presence
of vowels like /i/, /a/, and /u/ appears to be one of the anatomic
factors that makes this encoding process possible.



In Figure 8 we have reproduced two simplified spectrographic patterns
that, when converted to sound, produce approximations to the syllables
/di/ and /du/ (Liberman, 1970). The dark bands on these patterns represent
the first- and second-formant frequencies of the supralaryngeal vocal tract
as functions of time. Note that the formants rapidly move through a range
of frequencies at the left of each pattern. These rapid movements, which
occur in about 50 msec, are called transitions. The transition in the second
formant, which is encircled, conveys the acoustic information that human
listeners interpret as a token of a /d/ in the syllables /di/ and /du/. It
is impossible to isolate the acoustic pattern of /d/ in these
syllables. If tape recordings of these two syllables are "sliced" with the
electronic equivalent of a pair of scissors, it is impossible to find a segment
that contains only /d/. There is no way to cut the tape so as to obtain
a piece that will produce /d/ without also producing the next vowel or some
reduced approximation to it.



The encircled transitions are different for the two syllables.
If these encircled transitions are isolated, listeners report that they hear
either an upgoing or a falling frequency modulation. In context, with the
syllable, these transitions cause listeners
to hear an identical sounding /d/ in both syllables. How does a human
listener effect this perceptual response?

Simplified Spectrographic Patterns
Sufficient to Produce the Syllables /di/ and /du/
The circles enclose the second formant frequency transitions.
(After Liberman, 1970.)

We have noted that the formant frequency patterns of speech reflect the resonances
of the supralaryngeal vocal tract. The formant patterns that define
the syllable /di/ in Figure 8 thus reflect the changing resonant pattern of
the supralaryngeal vocal tract as the speaker moves his articulators from
the occlusion of the tongue tip against the palate that is involved in the
production of /d/ to the vocal tract configuration of the /i/. A different
acoustic pattern defines the /d/ in the syllable /du/. The resonances of the
vocal tract are similar as the speaker forms the initial occlusion of the /d/
in both syllables; however, the resonances of the vocal tract are quite different
for the final configurations of the vocal tract for /i/ and /u/. The
formant patterns that convey the /d/ in both syllables are thus quite different
since they involve transitions from the same starting point to different
end points. Human listeners "hear" an identical initial /d/ segment in both
of these signals because they "decode" the acoustic pattern in terms of the
articulatory gestures and the anatomical apparatus that is involved in the
production of speech. The listener in this process, which has been termed
the "motor theory of speech perception" (Liberman et al., 1967), operates in
terms of the acoustic pattern of the entire syllable. The acoustic cues for
the individual "phonetic segments" are fused into a syllabic pattern. The
high rate of information transfer of human speech is thus due to the transmission
of acoustic information in syllable-sized units. The phonetic elements
of each syllable are "encoded" into a single acoustic pattern which is
then "decoded" by the listener to yield the phonetic representation.

In order for the process of "motor theory perception" to work the lis- 
tener must be able to determine the absolute size of the speaker's vocal 
tract. Similar articulatory gestures will have different acoustic correlates 
in different-sized vocal tracts. The frequency of the first formant of /a/,
for example, varies from 730 to 1030 Hz in the data of Peterson and Barney
(1952) for adult men and children. The frequencies of the resonances that 
occur for various consonants likewise are a function of the size of the 
speaker's vocal tract. The resonant pattern that is the correlate of the 
consonant /g/ for a speaker with a large vocal tract may overlap with the 
resonant pattern of the consonant /d/ for a speaker with a small vocal tract 
(Rand, 1971). The listener therefore must be able to deduce the size of the 
speaker's vocal tract before he can assign an acoustic signal to the correct 
consonantal or vocalic class. 

There are a number of ways in which a human listener can infer the size
of a speaker's supralaryngeal vocal tract. He can, for example, note the
fundamental frequency of phonation. Children, who have smaller vocal tracts,
usually have higher fundamental frequencies than adult men or adult women.
Adult men, however, have disproportionately lower fundamental frequencies
than adult women (Peterson and Barney, 1952), so fundamental frequency is not
an infallible cue to vocal tract size. Perceptual experiments (Ladefoged and
Broadbent, 1957) have shown that human listeners can make use of the formant
frequency range of a short passage of speech to arrive at an estimate of the
size of a speaker's vocal tract. Recent experiments, however, show that human
listeners do not have to defer their "motor theory" decoding of speech
until they hear a two- or three-second interval of speech. Instead, they
use the vocalic information encoded in a syllable to decode the syllable
(Darwin, in press; Rand, 1971). This may appear to be paradoxical, but it
is not. The listener makes use of the formant frequencies and fundamental
frequency of the syllable's vowel to assess the size of the vocal tract that
produced the syllable. We have noted throughout this paper that the vowels
/a/, /i/, and /u/ have a unique acoustical property. The formant frequency
pattern for these vowels can always be related to a unique vocal tract size
and shape. A listener, when he hears one of these vowels, can thus instantly
determine the size of the speaker's vocal tract. The vowels /a/, /i/, and
/u/ (and the glides /y/ and /w/) thereby serve as acoustic calibration signals
in human speech.
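
The calibration idea can be sketched as a simple scaling calculation. The example below is a hypothetical illustration, not a model of the listener's actual perceptual processing: it assumes that a difference in vocal tract size scales all formant frequencies by one common factor, which is only a first approximation. The 730 Hz reference and the 1030 Hz token are the extremes of the first-formant range for /a/ cited above from Peterson and Barney (1952); the function names and the 2000 Hz test value are invented for the example.

```python
# Illustrative sketch only: using a "calibrating" vowel to estimate a
# speaker's vocal tract scale factor. Uniform scaling of all formants is an
# assumed simplification; function names are hypothetical.

REFERENCE_A_F1_HZ = 730.0  # low end of the /a/ first-formant range cited in the text


def scale_factor_from_a(observed_a_f1_hz, reference_hz=REFERENCE_A_F1_HZ):
    """Ratio of the observed /a/ first formant to the reference speaker's.

    A factor above 1 implies a vocal tract shorter than the reference.
    """
    if observed_a_f1_hz <= 0 or reference_hz <= 0:
        raise ValueError("formant frequencies must be positive")
    return observed_a_f1_hz / reference_hz


def normalize_formants(formants_hz, factor):
    """Rescale formants to what the reference-sized vocal tract would produce."""
    return [f / factor for f in formants_hz]


if __name__ == "__main__":
    factor = scale_factor_from_a(1030.0)   # an /a/ from a small vocal tract
    print(round(factor, 3))                # relative formant scaling
    print(round(1.0 / factor, 3))          # implied relative tract length
    # A resonance heard at 2000 Hz is reinterpreted on the reference scale.
    print(normalize_formants([2000.0], factor))
```

Once the factor is known, a resonant pattern that would otherwise be ambiguous between, say, /d/ for a small vocal tract and /g/ for a large one can be assigned to the class appropriate for that speaker.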

The absence of a human-like pharyngeal region in apes, newborn man, and 
Neanderthal man is quite reasonable. The only function that the human supralaryngeal
vocal tract is better adapted to is speech production, in particular
the production of vowels like /a/, /i/, and /u/. The human supralaryngeal
vocal tract is otherwise less well adapted for the primary vegetative
functions of respiration, chewing, and swallowing (Lieberman et al., 1971; 
Crelin et al., forthcoming). This suggests that the evolution of the human 
vocal tract which allows vowels like /a/, /i/, and /u/ to be produced and the 
universal occurrence of these vowels in human languages reflect a parallel de- 
velopment of the neural and anatomic abilities that are necessary for lan- 
guage. This parallel development would be consistent with the evolution of 
other human abilities. The ability to use tools depends, for example, both on 
upright posture and an opposable thumb, and on neural ability. 

Neanderthal man lacked the vocal tract that is necessary to produce the
human "vocal tract size-calibrating" vowels /a/, /i/, and /u/. This suggests
that the speech of Neanderthal man did not make use of syllabic encoding.
While communication is obviously possible without syllabic encoding, studies
of alternate methods of communication in modern man show, as we noted before,
that the rate at which information can be transferred is about one-tenth that
of normal human speech.

It is imperative to note that classic Neanderthal man, as typified by
fossils whose skull bases are similar to the La Chapelle-aux-Saints, La Ferrassie,
La Quina, Pech-de-l'Azé, and Monte Circeo fossil hominids (as well as
many others), probably does not represent the mainstream of human evolution.
Although Neanderthal man and modern man probably had a common ancestor,
Neanderthal represents a divergent species (Boule and Vallois, 1957; Vlček,
1970; Lieberman and Crelin, 1971). In Figure 9 we have photographed a casting
of a reconstruction of the fossil Steinheim calvarium with the mandible
of the La Chapelle-aux-Saints fossil. The mandible of the Steinheim fossil
hominid never was found. Note that the La Chapelle-aux-Saints mandible is
too long. In Figure 10 the Steinheim fossil has been fitted with a mandible
from a normal adult human, which best "fits" the Steinheim fossil. We are in
the process of reconstructing the supralaryngeal vocal tract of the Steinheim
fossil (Crelin et al., forthcoming). It is quite likely that this fossil,
which is approximately 300,000 years old, had the vocal tract anatomy that is
necessary for human speech. The evolution of the anatomical basis for human
speech thus would not appear to be the result of abrupt, recent change in the
morphology of the skull and soft tissue of the vocal tract. We have noted a
number of fossil forms that appear to represent intermediate stages in the
evolution of the vocal tract. Recent fossil discoveries indicate that the
evolution of the human vocal tract may have started at least 2.6 million years
ago. It is therefore not surprising to find that the neural aspects of
speech perception are matched to the anatomical aspects of speech production.
Nor should we be surprised to note that "naturalness" constraints relate the
phonetic and phonologic levels of grammar (Jakobson et al., 1952; Postal,
1968; Chomsky and Halle, 1969).

Reconstructed Steinheim Calvarium with Neanderthaloid Mandible
Note that the Neanderthal mandible is too large. (After Crelin et al., forthcoming.)

Reconstructed Steinheim Calvarium with a Modern Human Mandible
Fig. 10
This represents the best "fit." (After Crelin et al., forthcoming.)

Sir Arthur Keith many years ago speculated on the antiquity of man. We
now know that hominid evolution can be traced back at least 3 million years.
The evolution of phonetic ability appears to have been an integral part of
this evolutionary process. It may have its origins at the very beginnings
of hominid evolution.